How are property taxes changing over time?

In this data visualization, we will be looking only at homes with the homeowner’s exemption that maintained that exemption from 2007-2016. This means homes that are sold from one primary resident to another. The “clean_data.rmd” notebook contains all the code for cleaning, subsetting, and preparing the data.

Loading libraries and the data:

library(tidyverse)
library(ggplot2)
library(scales)
# library(plotly)

dat = read_rds("./data/compressed_assessors_data_subset.rds")

First let’s look at the data from the most recent year. The total taxable assessments are plotted below with the x axis with and without log scale, because of the long tail of very high taxable assessments. The assessments range from ~$7,500 to ~$14,000,000 with a mean of ~$480,000 and a mean of ~$360,000.

currdat = dat %>% filter(`Closed Roll Year` == 2016)
round(summary(currdat$`Total Taxable Assessment`))
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     7586   154442   359064   485878   648759 14313114
p = ggplot(data = currdat) + geom_histogram(aes(x = `Total Taxable Assessment`), 
    binwidth = 1000)
p

p = ggplot(data = currdat) + geom_histogram(aes(x = `Total Taxable Assessment`), 
    binwidth = 0.01) + scale_x_continuous(trans = "log")
p

Here we can see how these numbers have changed over time.

tmp1 = dat %>% group_by(`Closed Roll Year`) %>% summarise_at(vars(`Total Taxable Assessment`), 
    mean, na.rm = TRUE) %>% mutate(statistic = "Mean")

tmp2 = dat %>% group_by(`Closed Roll Year`) %>% summarise_at(vars(`Total Taxable Assessment`), 
    median, na.rm = TRUE) %>% mutate(statistic = "Median")

tmp3 = dat %>% group_by(`Closed Roll Year`) %>% summarise_at(vars(`Total Taxable Assessment`), 
    min, na.rm = TRUE) %>% mutate(statistic = "Min")

tmp4 = dat %>% group_by(`Closed Roll Year`) %>% summarise_at(vars(`Total Taxable Assessment`), 
    max, na.rm = TRUE) %>% mutate(statistic = "Max")

summary_tta = do.call(rbind, list(tmp1, tmp2, tmp3, 
    tmp4))

p = ggplot(data = summary_tta) + geom_line(aes(x = `Closed Roll Year`, 
    y = `Total Taxable Assessment`, group = statistic, 
    color = statistic)) + ggtitle("Total Taxable Assessments over Time") + 
    scale_y_continuous(trans = "log", breaks = function(x) unique(floor(pretty(seq(0, 
        (max(x) + 1) * 1.1)))))
p

p = ggplot(data = summary_tta %>% filter(statistic == 
    "Mean")) + geom_line(aes(x = `Closed Roll Year`, 
    y = `Total Taxable Assessment`)) + ylim(2e+05, 
    5e+05) + ggtitle("Mean Total Taxable Assessment over Time")
p

p = ggplot(data = summary_tta %>% filter(statistic == 
    "Median")) + geom_line(aes(x = `Closed Roll Year`, 
    y = `Total Taxable Assessment`)) + ylim(2e+05, 
    5e+05) + ggtitle("Median Total Taxable Assessment over Time")
p

p = ggplot(data = summary_tta %>% filter(statistic == 
    "Max")) + geom_line(aes(x = `Closed Roll Year`, 
    y = `Total Taxable Assessment`)) + ggtitle("Max Total Taxable Assessment over Time")
p

p = ggplot(data = summary_tta %>% filter(statistic == 
    "Min")) + geom_line(aes(x = `Closed Roll Year`, 
    y = `Total Taxable Assessment`)) + ggtitle("Min Total Taxable Assessment over Time")
p

We can see the trends in these statistics by fitting each with a linear model. The maximum assesssment increases and average of ~$175,000 each year, the minimum assessment slope is calculated at $60 each year, but this number is skewed by the ver low minimum in 2014. Exluding this get us an average of ~$180 per year. The mean assessment and the median assessment increase at ~$10,000 and ~$6000 per year. A next step would be to separate the data further into neighborhoods, and perhaps houses that are sold each year, since it is those larger jumps in value that would be useful to predict for home sellers.

summary_tta %>% group_by(statistic) %>% summarise(`Slope Total Assessment` = lm(`Total Taxable Assessment` ~ 
    `Closed Roll Year`)$coefficients[[2]])
summary_tta %>% filter(`Closed Roll Year` != 2014, 
    statistic == "Min") %>% summarise(`Slope Total Assessment` = lm(`Total Taxable Assessment` ~ 
    `Closed Roll Year`)$coefficients[[2]])

Below are plotted the earliest and latest years on record, to estimate the dates the property was built and last sold. There definitely seems to be an issue with the dates the properties were last sold, as there are large peaks every 10 years, and apparently a lot of missing data in the early 2000s.

p = ggplot(data = currdat %>% filter(!is.na(`Earliest Year`))) + 
    geom_histogram(aes(x = as.numeric(`Earliest Year`)), 
        binwidth = 1) + xlim(1900, 2016)
p

p = ggplot(data = currdat %>% filter(!is.na(`Latest Year`))) + 
    geom_histogram(aes(x = as.numeric(`Latest Year`)), 
        binwidth = 1) + xlim(1900, 2016)
p

Here we can look at the current total taxable Assessments for different neighborhoods in San Francisco in 2016, as well as the average change in total taxable assessments per year. I have plotted them normally and in log scale for easier comparison of the medians of the neighborhoods.

p = ggplot(data = currdat) + geom_boxplot(aes(x = `Analysis Neighborhood`, 
    y = `Total Taxable Assessment`)) + ggtitle("Total Taxable Assessment in 2016") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, 
        vjust = 0.5))
p

p = ggplot(data = currdat) + geom_boxplot(aes(x = `Analysis Neighborhood`, 
    y = `Total Taxable Assessment`)) + ggtitle("Total Taxable Assessment in 2016") + 
    scale_y_continuous(trans = "log", labels = scales::number_format(accuracy = 1)) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, 
        vjust = 0.5))
p

p = ggplot(data = currdat) + geom_boxplot(aes(x = `Analysis Neighborhood`, 
    y = `Slope Total Assessment`)) + ggtitle("Average Changes in Total Taxable Assessments per Year") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, 
        vjust = 0.5))
p

p = ggplot(data = currdat) + geom_boxplot(aes(x = `Analysis Neighborhood`, 
    y = `Slope Total Assessment`)) + ggtitle("Average Changes in Total Taxable Assessments per Year") + 
    scale_y_continuous(trans = "log", labels = scales::number_format(accuracy = 1)) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, 
        vjust = 0.5))
p

Here we’ll look in more detail at how the total assessments have changed in different neighborhoods over the years, for properties that are listed as having sold since 2006 and those that are not listed as having sold.

neighborhoods_count = currdat %>% group_by(`Analysis Neighborhood`) %>% 
    summarise(count = n())
neighborhoods = neighborhoods_count %>% filter(count >= 
    1000) %>% select(`Analysis Neighborhood`)
neighborhoods = unique(neighborhoods$`Analysis Neighborhood`)

pAssessTime = function(nhood) {
    # print(nhood)
    currdat2 = dat %>% filter(`Analysis Neighborhood` == 
        nhood)
    p = ggplot(data = currdat2) + geom_line(aes(x = `Closed Roll Year`, 
        y = `Total Taxable Assessment`, group = `Parcel Number`, 
        color = `Earliest Year`)) + scale_color_distiller(palette = "Spectral") + 
        ggtitle(paste(nhood)) + theme_dark() + facet_wrap(~as.factor(`Latest Year` >= 
        2006))
    # facet_wrap(~as.factor(as.numeric(format(`Current
    # Sales Date`,'%Y')) >= 2006))
    return(p)
}

lapply(neighborhoods, pAssessTime)
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

## 
## [[9]]

## 
## [[10]]

## 
## [[11]]

## 
## [[12]]

## 
## [[13]]

## 
## [[14]]

## 
## [[15]]

## 
## [[16]]

## 
## [[17]]

## 
## [[18]]


So this shows some interesting patterns. We again see the issue of missing sales dates before 2010. We also see that for homes that were not sold after 2006, there is often a dip in property values starting after 2008 and reaching a minimum around 2011/2012. This is because homeowners can have their homes reassessed if the house loses value, and there was the 2008 financial crisis. If the housing market recovers, the house is reassessed but only to what would be maximally allowed based on the original property assessment. In houses that were sold after 2006, there are often large jumps in the assessment value, even through the financial crisis. The number of houses that maintained a steady property growth rate surprised me. Maybe the depressed property values after 2008 contributed to these properties not gaining greatly in value. Or perhaps the new owners benefitted from transfering their old property assessment, which is allowed by California law in many cases. Maybe the property was sold to a child or grandchild, who would be allowed to keep the lower assessment. It’s also possible that the sold date is incorrect.

The next step here is to identify the actually sold homes by a maximum increase in percent of the total taxable assessment, while excluding those homes that just regained lost value during the financial crisis. This could be done by comparing the 2006 and 2016 assessments and saying those that increased by only the amount allowed by law (~1-2% per year), would not count as sold, even if there was a large increase in value after the recovery from the financial crisis.

Mapping Assessor’s Office Data

Let’s look at homes that increased in value more than they would by the typical amount allowed by law if the property was not sold (~2%). However, we’re still not excluding homes that decreased and then increased values during and after the financial crisis.

This plot shows that there will be very few properties overall that increase more than 2% per year.

p = ggplot(data = currdat %>% filter(`Percent Difference Previous Year` < 
    0.03, `Percent Difference Previous Year` > 0.01)) + 
    geom_histogram(aes(x = `Percent Difference Previous Year`), 
        binwidth = 1e-04) + scale_x_continuous(labels = scales::number_format(accuracy = 0.001))
p

Here are maps of the locations of the properties that increased their taxable assessment by more than 2.25% from the previous year. A next step will be to plot these over a proper map of San Francisco.

currdat = dat %>% filter(`Percent Difference Previous Year` > 
    0.0225)
years = sort(unique(currdat$`Closed Roll Year`))

pAssessPercent = function(year) {
    # print(year)
    currdat2 = currdat %>% filter(`Closed Roll Year` == 
        year)
    p = ggplot() + geom_point(data = currdat2, aes(x = long, 
        y = lat, color = `Percent Difference Previous Year` * 
            100, group = `Property Location`)) + # scale_color_gradientn(trans = 'log', colors =
    # rainbow(9))+
    scale_color_distiller(name = "%", limits = c(2.25, 
        max(currdat$`Percent Difference Previous Year`) * 
            100), palette = "Spectral", trans = "log", 
        labels = scales::number_format(accuracy = 1)) + 
        ggtitle(paste("Percent Difference in Assessment from", 
            year - 1, "to", year)) + coord_map()
    return(p)
}

lapply(years, pAssessPercent)
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

## 
## [[9]]

Mapping Potential Rent Control

These are all owner occupied properties, as they have the homeowner’s deduction. If the owner rented them out, how many would be covered by rent control? These would be buildings built before 1977.

currdat = dat %>% filter(`Closed Roll Year` == 2016)

p = ggplot() + geom_point(data = currdat, aes(x = long, 
    y = lat, color = as.factor(`Earliest Year` <= 1977), 
    group = `Property Location`)) + scale_color_discrete() + 
    ggtitle("Rent Control Qualified") + coord_map()
p

Many more questions to explore!

1. Who is rent control benefiting?

  • What are the incomes of people who are in rent control?
  • What are the rents of rent control units?
  • Are people in rent control using other city programs?
  • Can I find out if some people who have rent control, own homes elsewhere?
  • Are rent controlled homes more “derelict?”
  • Do rent controlled units rent for a premium?
  • What about people without rent control. Are they more likely to have roomates?
  • Or be wealthier?
  • Or move more often?
  • Does rent control change or correlate with certain behaviors?

2. Of people who own homes:

  • How long have they been there?
  • What taxes are people paying?
  • What are the incomes of people who own?
  • Variables to explore: income, year purchased, taxes paid

3. Homes and apartments in general. What does turnover look like?

  • How often/were are homes being sold?
  • How often/were are homes sold to foreign buyers?
  • How often/where are homes put on AirBnB or VRBO as short term rentals?
  • Whats the historical trend of houses being condo converted?